A large-vocabulary Taiwanese (Min-nan) multi-syllabic word recognition system based upon right-context-dependent phones with state clustering by acoustic decision tree
Abstract
In this paper, we apply context-dependent phonetic modeling to large-vocabulary (20 thousand words) Taiwanese multi-syllabic word recognition. Given the phonetic characteristics of Taiwanese, right-context-dependent (RCD) phones are used instead of general triphones. The RCD phones are further clustered at the sub-phone (state) level using a decision tree with a set of context-split questions designed specifically for Taiwanese speech from acoustic/phonetic knowledge. In the speaker-dependent case, a word error rate of 7.18% is achieved. A real-time prototype system implemented on a Pentium II personal computer running MS Windows 95/NT is also demonstrated to validate the proposed approaches.
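To make the difference between RCD phones and triphones concrete, the following is a minimal sketch, not the authors' implementation. It assumes the HTK-style label notation (`p+r` for a right-context phone, `l-p+r` for a triphone), a hypothetical boundary symbol `sil`, and an illustrative phone sequence that is not taken from the paper's phone set.

```python
def to_rcd(phones, boundary="sil"):
    """Label each phone with its right neighbour only, as 'p+r'."""
    labels = []
    for i, phone in enumerate(phones):
        # The last phone has no right neighbour, so use the boundary symbol.
        right = phones[i + 1] if i + 1 < len(phones) else boundary
        labels.append(f"{phone}+{right}")
    return labels


def to_triphone(phones, boundary="sil"):
    """Label each phone with both neighbours, as 'l-p+r'."""
    labels = []
    for i, phone in enumerate(phones):
        left = phones[i - 1] if i > 0 else boundary
        right = phones[i + 1] if i + 1 < len(phones) else boundary
        labels.append(f"{left}-{phone}+{right}")
    return labels


# Hypothetical phone sequence for a single syllable (illustration only).
example = ["t", "a", "i"]
rcd_labels = to_rcd(example)
triphone_labels = to_triphone(example)
```

Because an RCD label depends on one neighbour rather than two, the number of distinct units grows roughly with the square of the phone inventory instead of the cube, which leaves more training data per model before any state clustering is applied.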
Similar resources
Speaker Independent Acoustic Modeling for Large Vocabulary Bi-lingual Taiwanese/Mandarin Continuous Speech Recognition
In this paper, we describe the acoustic modelling technique for a bi-lingual Taiwanese/Mandarin speech recognition system, which handles speaker-independent continuous speech using HMMs clustered by an acoustic-phonetic decision tree. A bi-lingual recogniser was built with a bi-lingual database of 120 speakers. The vocabulary size of this system is up to 40 thousand words. Unigram, bi-gram, and ...
A bi-lingual Mandarin/Taiwanese (Min-nan), large vocabulary, continuous speech recognition system based on the Tong-yong Phonetic Alphabet (TYPA)
In this paper, we describe the first Mandarin/Taiwanese (Min-nan) bi-lingual, continuous speech recognition system for large vocabulary or vocabulary-independent applications. A phonetic transcription system called Tong-yong Phonetic Alphabet (TYPA) is described and used to transcribe the bi-lingual Mandarin/Taiwanese lexicons. The Right-Context-Dependent (RCD) phonetic continuous-density Hidden ...
Decision tree state clustering with word and syllable features
In large vocabulary continuous speech recognition, decision trees are widely used to cluster triphone states. In addition to commonly used phonetically based questions, others have proposed additional questions such as phone position within word or syllable. This paper examines using the word or syllable context itself as a feature in the decision tree, providing an elegant way of introducing w...
Large vocabulary Taiwanese (Min-nan) speech recognition using tone features and statistical pronunciation modeling
A large vocabulary Taiwanese (Min-nan) speech recognition system is described in this paper. Due to the severe multiple pronunciation phenomenon in Taiwanese partly caused by tone sandhi, a statistical pronunciation modeling technique based on tonal features is used. This system is speaker independent. It was trained by a bi-lingual Mandarin/Taiwanese speech corpus to alleviate the lack of pure...
Modelling and decoding of crossword context dependent phones in the Philips large vocabulary continuous speech recognition system
The performance of the Philips system for large vocabulary continuous speech recognition has been improved significantly by crossword N-phone modelling, enhanced clustering of HMM-states during training, consistent handling of untrained HMM-states during decoding and a new efficient crossword N-phone M-gram decoding strategy. We report word error rate reductions of up to 18% on various ARPA test s...